home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
ftp.cs.arizona.edu
/
ftp.cs.arizona.edu.tar
/
ftp.cs.arizona.edu
/
icon
/
newsgrp
/
group98a.txt
/
000131_icon-group-sender _Fri Mar 13 12:35:20 1998.msg
< prev
next >
Wrap
Internet Message Format
|
2000-09-20
|
3KB
Return-Path: <icon-group-sender>
Received: from kingfisher.CS.Arizona.EDU (kingfisher.CS.Arizona.EDU [192.12.69.239])
by baskerville.CS.Arizona.EDU (8.8.7/8.8.7) with SMTP id MAA14412
for <icon-group-addresses@baskerville.CS.Arizona.EDU>; Fri, 13 Mar 1998 12:35:20 -0700 (MST)
Received: by kingfisher.CS.Arizona.EDU (5.65v4.0/1.1.8.2/08Nov94-0446PM)
id AA15665; Fri, 13 Mar 1998 12:35:19 -0700
From: eka@corp.cirrus.com (Eka Laiman)
Message-Id: <199803131831.KAA01976@sims-rd.corp.cirrus.com>
Subject: Re: Letter Probabilities
To: evans@gte.net
Date: Fri, 13 Mar 1998 10:27:42 -0800 (PST)
Cc: icon-group@optima.CS.Arizona.EDU
In-Reply-To: <35089F74.6018@gte.net> from "Mark Evans" at Mar 12, 98 08:52:36 pm
X-Mailer: ELM [version 2.4 PL24alpha3]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Errors-To: icon-group-errors@optima.CS.Arizona.EDU
Status: RO
Content-Length: 1684
Mark Evans wrote:
> Here is a small Icon problem related to letter probabilites. Each
> letter has a "probability of occurrence" in the information-theoretic
> setting. This probability can be estimated from sample texts. I want
> to generate random text based on these probabilities.
> ............... rest is removed .........
This is what I would do:
(1) Construct a string of length n where the members of the string
are letters and the number of occurrence of each letter is
according to the probability of letter occurrence. The "space"
character of course part of the alphabet.
For example: consider the character set consists of letter
A (30%), B (20%), C (40%), space (10%). Choose n = 200 (icon
is very good in handling super long string), then the number
of A's will be 60 characters, B's 40 characters, C's 80 characters,
and spaces 20 characters.
Such a string may look (keep in mind that the characters have been
distributed using random permutation - cf. Floyd's algorithm for
random permutation):
BAA CACCB AA CBCCAB .......
(2) Repeat as many times as the length of the random text that need to
be generated:
Produce a random number i between 1 and n (200 in the above example)
and print string[i:i+1]
Floyd's algorithm of generating random permutation from 1 to n:
(1) Fill an array L such that L[i] := i
(2) every i := n to 2 by -1 do {
t := ?i # generate a random index between 1 to i
L[i] :=: L[t] # exchange the t-th element with the last
# element in current sublist
}
I think the above method is quite efficient since the distribution of
the character is done once.
-eka-